approach can speed up the multiple sequence comparison dramatically
[...] sequences.
Figure 7.14 (a) The CPU time comparison between the alignment-based approach using the
Needleman-Wunsch algorithm and the alignment-free approach using the k-mer word
frequency library approach for sequence comparison. The horizontal axis stands for the
sequence length. (b) The accuracy comparison between the alignment-based approach
using the Needleman-Wunsch algorithm and the alignment-free approach, i.e., the k-mer
word frequency library approach for sequence comparison. ‘rho’ stands for the correlation
coefficient and ‘p’ is the correlation test p value.
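The two kinds of distance compared in this figure can be illustrated with a minimal Python sketch. It is not the implementation used for the figure. The word length k = 3, the Euclidean distance between word frequency vectors, the unit mismatch and gap costs in the Needleman-Wunsch recursion, and the substitution-only `mutate` helper are all assumptions made for illustration, and every function name is hypothetical.

```python
import random
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def kmer_frequencies(seq, k=3):
    """Relative frequency of every possible k-mer word over ACGT (k = 3 is assumed)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(sum(counts.values()), 1)  # avoid division by zero for short input
    return [counts["".join(word)] / total for word in product(ALPHABET, repeat=k)]


def kmer_distance(a, b, k=3):
    """Alignment-free distance: Euclidean distance between frequency vectors (assumed metric)."""
    fa, fb = kmer_frequencies(a, k), kmer_frequencies(b, k)
    return sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


def nw_distance(a, b, mismatch=1, gap=1):
    """Needleman-Wunsch global alignment in minimum-cost form (unit costs are assumed)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            substitute = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch)
            curr.append(min(substitute, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def mutate(seq, rate, rng):
    """Replace each base by a different base with probability `rate` (substitutions only)."""
    mutated = []
    for base in seq:
        if rng.random() < rate:
            mutated.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            mutated.append(base)
    return "".join(mutated)


if __name__ == "__main__":
    rng = random.Random(0)
    original = "".join(rng.choice(ALPHABET) for _ in range(200))
    mutated = mutate(original, 0.05, rng)
    print("alignment distance per base:", nw_distance(original, mutated) / len(original))
    print("k-mer frequency distance:", kmer_distance(original, mutated))
```

The alignment distance needs a dynamic programming table whose size grows with the product of the two sequence lengths, whereas the word frequency vectors are built in a single pass over each sequence. This difference in cost is what the CPU time comparison in panel (a) reflects.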
The accuracy comparison
Whether the alignment-free approach is accurate was also examined. The
aim was to investigate whether the distance measurement between
sequences of the alignment-free approach was correlated with the
alignment distance of the alignment-based approach. One hundred pairs of
pseudo nucleotide sequences were randomly generated. The mutation rate
varied between 0.5% and 5%. Every mutation rate was repeated 500
times. Therefore, there were 50,000 trials in total. Both the alignment-based
approach and the alignment-free approach were applied to each
pair of random pseudo nucleotide sequences. First, the distance percentage
was defined as the ratio of the alignment distance over the alignment
length. The correlation between the distance percentage and the distance
of k-mer word frequencies was tested. Figure 7.14(b) shows the result.
It can be seen that the correlation coefficient was greater than 0.927 and